Convex Optimization 2015.11.18
Let \(\{ f_a : a \in \mathcal{A} \}\) be a collection of convex functions from \(\mathbb{R}^n\) to \(\mathbb{R}\) with the same domain. Then \(f(x) = \sup_{a \in \mathcal{A}} f_a(x)\) is a convex function.
Proof 1: Take \(x, y \in dom \, f\), \(\theta \in [0, 1]\),
\[\begin{align*} f(\theta x + (1 - \theta)y) = & \sup_a f_a(\theta x + (1 - \theta)y) \\ \le & \sup_a [\theta f_a(x) + (1 - \theta) f_a(y)] \\ \le & \theta \sup_a f_a(x) + (1 - \theta) \sup_a f_a(y) \\ = & \theta f(x) + (1 - \theta) f(y) \end{align*}\]
(The proposition is true for \(dom \, f = \bigcap _{a \in \mathcal{A}} dom \, f_a\), but false for \(dom \, f = \bigcup _{a \in \mathcal{A}} dom \, f_a\).)
Proof 2: \[\begin{align*} epi(f) = & \{ (x, t) : x \in dom \, f, \, t \ge f_a(x) \; \forall a \in \mathcal{A} \} \\ = & \{ (x, t) : (x, t) \in epi(f_a) \; \forall a \in \mathcal{A} \} \\ = & \bigcap _{a \in \mathcal{A}} epi(f_a) \end{align*}\]
Each \(epi(f_a)\) is convex, and an intersection of convex sets is convex, so \(epi(f)\) is convex and hence \(f\) is convex.
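A minimal numerical sanity check (not part of the notes; the affine family below is my own choice): take finitely many affine functions \(f_a(x) = a^T x + c_a\), each convex, and verify the convexity inequality for their pointwise maximum at random points.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # five affine functions f_a(x) = a^T x + c_a on R^3
c = rng.normal(size=5)

def f(x):
    return np.max(A @ x + c)  # f(x) = sup_a f_a(x), a pointwise maximum

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-9
print("convexity inequality held at all sampled points")
```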
Example: Let \(x_{[i]}\) denote the \(i\)-th largest component of \(x = (x_1, \cdots, x_n) \in \mathbb{R}^n\),
then \(\mathrm{maxsum}_r(x) = x_{[1]} + x_{[2]} + \cdots + x_{[r]}\) is convex.
Proof: \(\mathrm{maxsum}_r(x) = \max \{ x_{i_1} + x_{i_2} + \cdots + x_{i_r} : \{ i_1, \cdots, i_r \} \subseteq \{ 1, \cdots, n \}, \, i_j \neq i_k \text{ for } j \neq k \}\),
so it is a pointwise maximum of \(\binom{n}{r}\) linear functions, hence convex.
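A quick check of the identity used in this proof (the sizes and random data are illustrative only): the sum of the \(r\) largest entries equals the maximum of the subset sums over all \(r\)-element index sets.

```python
import itertools
import numpy as np

def maxsum(x, r):
    return np.sort(x)[-r:].sum()            # sum of the r largest entries

def maxsum_via_subsets(x, r):
    return max(sum(x[list(s)]) for s in itertools.combinations(range(len(x)), r))

rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.normal(size=6)
    assert np.isclose(maxsum(x, 3), maxsum_via_subsets(x, 3))
print("maxsum_r equals the maximum over all r-element index sets")
```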
Example: Let \(C \subseteq \mathbb{R}^n\), define \(S_C(x) = \sup \{ y^T x : y \in C \}\) (the support function of \(C\)); then \(S_C\) is convex, being a supremum of linear functions of \(x\).
Example: Let \(f : S^n \to \mathbb{R}\), where \(f(X)\) is the largest eigenvalue of \(X\).
Claim: \(f\) is convex.
Proof: First claim that \(f(X) = \sup \{ y^T X y : \lVert y \rVert _2 = 1 \}\).
Proof of claim: Write \(X = P D P^T\) with \(P\) orthogonal and \(D = diag(\lambda_1, \cdots, \lambda_n)\); substituting \(v = P^T y\) (which preserves the 2-norm), \[\begin{align*} \sup _{\lVert y \rVert _2 = 1} y^T X y = & \sup _{\lVert y \rVert _2 = 1} y^T P D P^T y \\ = & \sup _{\lVert v \rVert _2 = 1} v^T D v \\ = & \sup _{\lVert v \rVert _2 = 1} \sum _{i = 1}^n \lambda _i v_i^2 \\ \le & \sup _{\lVert v \rVert _2 = 1} \max _i (\lambda _i) \sum_{i = 1}^n v_i^2 \\ = & \max _i (\lambda _i) \end{align*}\] and the bound is attained at \(v = e_i\) for an index \(i\) achieving \(\max _i \lambda_i\), so the supremum equals the largest eigenvalue. Since \(y^T X y\) is linear in \(X\) for each fixed \(y\), \(f\) is a supremum of linear functions, hence convex.
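A numerical illustration of the claim (the test matrix is synthetic, chosen here only for the check): the largest eigenvalue of a symmetric matrix bounds \(y^T X y\) over random unit vectors, and the sampled supremum approaches it.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
X = (B + B.T) / 2                                  # a symmetric test matrix

lam_max = np.linalg.eigvalsh(X)[-1]                # exact largest eigenvalue

Y = rng.normal(size=(200000, 4))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # random unit vectors y
sampled = np.einsum('ij,jk,ik->i', Y, X, Y)        # y^T X y for each sample

assert sampled.max() <= lam_max + 1e-9             # never exceeds the eigenvalue
print(lam_max, sampled.max())                      # and the sampled sup comes close
```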
Example: Let \(f : \mathbb{R}^{m \times n} \to \mathbb{R}\) be defined by \(f(X) = \lVert X \rVert _2\) where \(\lVert X \rVert _2 = \sup _{\lVert y \rVert _2 = 1} \lVert X y \rVert _2\) is the spectral norm of \(X \in \mathbb{R}^{m \times n}\).
Claim: \(f(X) = \sup _{u, v} \{ u^T X v : \lVert u \rVert _2 = 1, \lVert v \rVert _2 = 1 \}\)
because \(\lVert Xv \rVert _2 = \sup \{ u^T X v : u \in \mathbb{R}^m, \, \lVert u \rVert _2 = 1 \}\), so \(f\) is a supremum of linear functions of \(X\).
More generally: \(\lVert X \rVert _{a, b} = \sup \{ \lVert Xv \rVert _b : \lVert v \rVert _a = 1 \} = \sup \{ u^T X v : \lVert v \rVert _a = 1, \lVert u \rVert _{b^*} = 1 \}\), where \(\lVert \cdot \rVert _{b^*}\) is the dual norm of \(\lVert \cdot \rVert _b\)
(\(\lVert Xv \rVert _b = \lVert Xv \rVert _{b^{**}} = \sup _u \{ u^T X v : \lVert u \rVert _{b^*} = 1 \}\))
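A small check of the spectral-norm claim (the matrix is arbitrary test data): \(\lVert X \rVert _2\) from numpy equals \(u^T X v\) at the top singular vectors, which are the unit vectors attaining the supremum.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 5))

spec = np.linalg.norm(X, 2)          # spectral norm = largest singular value

U, s, Vt = np.linalg.svd(X)
u, v = U[:, 0], Vt[0]                # top singular vectors, unit 2-norm
assert np.isclose(u @ X @ v, spec)   # the sup over unit u, v is attained here
print(spec, u @ X @ v)
```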
Composition: Suppose \(f(x) = h(g(x))\), with \(x \in \mathbb{R}^n\) and \(g(x) \in \mathbb{R}\); when is \(f\) convex?
Would-be proof: Take \(x, y \in dom \, f\), \(\theta \in [0, 1]\), then
\[\begin{align*} f(\theta x + (1 - \theta)y) = & h(g(\theta x + (1 - \theta)y)) & \text{use $dom \, f$ is convex} \\ \le & h(\theta g(x) + (1 - \theta)g(y)) & \text{use $g$ is convex/concave, $h$ is nondecreasing/nonincreasing} \\ \le & \theta h(g(x)) + (1 - \theta)h(g(y)) & \text{use $h$ is convex} \\ = & \theta f(x) + (1 - \theta)f(y) \end{align*}\]
| | 1 | 2 | 3 | 4 |
| --- | --- | --- | --- | --- |
| Condition on \(g\) | convex | concave | convex | concave |
| Condition on \(h\) | nondecreasing | nonincreasing | nonincreasing | nondecreasing |
| Condition on \(h\) | convex | convex | concave | concave |
| Result: \(f\) | convex | convex | concave | concave |
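A spot check of case 1 of the table on a concrete instance of my own choosing: \(g(x) = \lVert x \rVert _2^2\) is convex and \(h(t) = e^t\) is convex and nondecreasing, so \(f = h \circ g\) should satisfy the convexity inequality.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return np.exp(np.dot(x, x))      # h(g(x)) with g(x) = ||x||_2^2, h = exp

for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-9 * (1 + rhs)
print("case 1 of the table held at all sampled points")
```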
Example: \(g(x) = x^2 - 1\), \(h(x) = x^{3 / 2}\), \(dom \, h = \mathbb{R}_+\);
then \(dom \, (h \circ g) = (-\infty, -1] \cup [1, +\infty)\) is not convex, so \(h \circ g\) is not convex even though \(g\) is convex and \(h\) is convex and nondecreasing on its domain.
Example 3.13:
If \(g\) is convex then \(e^{g(x)}\) is convex (case 1).
If \(g\) is concave and positive then \(\log(g(x))\) is concave (case 4).
If \(g\) is concave and positive then \(\frac{1}{g(x)}\) is convex (case 2).
If \(g\) is convex and nonnegative and \(p \ge 1\), then \(g(x)^p\) is convex (case 1).
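A quick check of the third rule above on an assumed instance (not from the text): on \(x \gt 0\), \(g(x) = \sqrt{x}\) is concave and positive, so \(1/g(x) = x^{-1/2}\) should be convex there.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(x):
    return 1.0 / np.sqrt(x)          # 1/g with g(x) = sqrt(x), concave and positive

for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0, size=2)
    theta = rng.uniform()
    assert f(theta * x + (1 - theta) * y) <= theta * f(x) + (1 - theta) * f(y) + 1e-12
print("1/g satisfied the convexity inequality on the sampled positive points")
```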
\(g : \mathbb{R}^n \to \mathbb{R}^m\): say \(g\) is K-convex, where \(K\) is a cone in \(\mathbb{R}^m\),
if \(dom \, g\) is convex and \(g(\theta x + (1 - \theta)y) \preceq _K \theta g(x) + (1 - \theta) g(y)\) for all \(x, y \in dom \, g\) and \(\theta \in [0, 1]\).
Say \(h\) is K-nondecreasing if \(x \succeq _K y \implies h(x) \ge h(y)\).
Example 3.14:
\(h(z) = \log \left(\sum _{i = 1}^k e^{z_i} \right)\), so \(\log \left( \sum _{i = 1}^k e^{g_i(x)} \right)\) will be convex if \(g_1, \cdots, g_k\) are convex (since \(h\) is convex and \(\mathbb{R}^k_+\)-nondecreasing).
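A numerical illustration of Example 3.14 with an assumed affine family \(g_i(x) = a_i^T x + c_i\) (affine, hence convex): the log-sum-exp composition should satisfy the convexity inequality at random points.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 3))          # g_i(x) = a_i^T x + c_i, affine hence convex
c = rng.normal(size=4)

def f(x):
    return np.log(np.sum(np.exp(A @ x + c)))   # log sum_i exp(g_i(x))

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    theta = rng.uniform()
    assert f(theta * x + (1 - theta) * y) <= theta * f(x) + (1 - theta) * f(y) + 1e-9
print("log-sum-exp composition held at all sampled points")
```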
Minimization: Let \(f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}\) be a convex function (jointly in \((x, y)\)), then
\(g(x) = \inf _{y : (x, y) \in dom \, f} f(x, y)\) is convex.
Proof: Let \(x_1, x_2 \in dom \, g\), \(\theta \in [0, 1]\).
then for any \(\varepsilon \gt 0\), \(\exists y_1, y_2\), s.t.
\(g(x_1) \ge f(x_1, y_1) - \varepsilon\), \(g(x_2) \ge f(x_2, y_2) - \varepsilon\) and
\[\begin{align*} g(\theta x_1 + (1 - \theta)x_2) = & \inf _y f(\theta x_1 + (1 - \theta)x_2, y) \\ \le & f(\theta x_1 + (1 - \theta)x_2, \theta y_1 + (1 - \theta)y_2) \\ \le & \theta f(x_1, y_1) + (1 - \theta) f(x_2, y_2) \\ \le & \theta g(x_1) + (1 - \theta) g(x_2) + \varepsilon \end{align*}\]
Since \(\varepsilon \gt 0\) was arbitrary, \(g(\theta x_1 + (1 - \theta)x_2) \le \theta g(x_1) + (1 - \theta) g(x_2)\), so \(g\) is convex.
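A sketch of this result on a concrete \(f\) of my own choosing: \(f(x, y) = (x - y)^2 + y^2\) is jointly convex, the inner infimum is computed here by a grid search over \(y\), and the partial minimum should match the closed form \(g(x) = x^2 / 2\) and be convex.

```python
import numpy as np

ys = np.linspace(-20.0, 20.0, 200001)           # fine grid for the inner infimum

def g(x):
    return np.min((x - ys) ** 2 + ys ** 2)      # numerical inf_y f(x, y)

rng = np.random.default_rng(7)
for _ in range(100):
    x1, x2 = rng.uniform(-5.0, 5.0, size=2)
    theta = rng.uniform()
    assert np.isclose(g(x1), x1 ** 2 / 2, atol=1e-6)   # closed-form partial minimum
    assert g(theta * x1 + (1 - theta) * x2) <= theta * g(x1) + (1 - theta) * g(x2) + 1e-6
print("g matches x^2/2 and satisfies the convexity inequality")
```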
Example: Let \(C \subseteq \mathbb{R}^n\) be a convex set, then
\(g(x) = \inf _{y \in C} \lVert x - y \rVert\) is a convex function.
Proof: Use \(f(x, y) = \lVert x - y \rVert\) with \(dom \, f = \mathbb{R}^n \times C\), which is convex since \(C\) is convex; \(f\) is jointly convex because
\[\begin{align*} f(\theta x_1 + (1 - \theta)x_2, \theta y_1 + (1 - \theta)y_2) = & \lVert \theta x_1 + (1 - \theta)x_2 - \theta y_1 - (1 - \theta)y_2 \rVert \\ = & \lVert \theta (x_1 - y_1) + (1 - \theta)(x_2 - y_2) \rVert \\ \le & \theta \lVert x_1 - y_1 \rVert + (1 - \theta) \lVert x_2 - y_2 \rVert \\ = & \theta f(x_1, y_1) + (1 - \theta) f(x_2, y_2) \end{align*}\]
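A numerical check of the distance example (the set \(C\) is my own choice): taking \(C\) to be the unit Euclidean ball, the distance has the explicit form \(g(x) = \max(\lVert x \rVert _2 - 1, 0)\), and we verify the convexity inequality at random points.

```python
import numpy as np

rng = np.random.default_rng(8)

def dist_to_ball(x):
    return max(np.linalg.norm(x) - 1.0, 0.0)    # distance from x to the unit ball

for _ in range(1000):
    x, y = rng.normal(size=3) * 3, rng.normal(size=3) * 3
    theta = rng.uniform()
    lhs = dist_to_ball(theta * x + (1 - theta) * y)
    rhs = theta * dist_to_ball(x) + (1 - theta) * dist_to_ball(y)
    assert lhs <= rhs + 1e-9
print("distance to the unit ball satisfied the convexity inequality")
```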
Consider the function \(g(w) = \inf _x \sum _{i = 1}^m w_i (a_i^T x - b_i)^2\), the optimal value of a "weighted least squares" problem.
This is a concave function of \(w\): for each fixed \(x\), \(\sum _{i = 1}^m w_i (a_i^T x - b_i)^2\) is linear in \(w\), so \(g\) is an infimum of linear functions of \(w\), hence concave.
Let \(g(w) = \inf _x (Ax - b)^T W (Ax - b) = \inf _x (x^T A^T W A x - 2b^T W A x + b^T W b)\), where \(A\) has rows \(a_i^T\) and \(W = diag(w)\).
assume \(A^T W A \succ 0\), then optimal \(x = (A^T W A)^{-1} A^T W b\)
(\(\min _x(x^T A x + 2b^T x), \nabla = 2Ax + 2b = 0 \implies \text{best} \, x = -A^{-1} b\))
\[\begin{align*} \text{optimal value} = & b^T W A (A^T W A)^{-1} A^T W b - 2 b^T W A (A^T W A)^{-1} A^T W b + b^T W b \\ = & - b^T W A (A^T W A)^{-1} A^T W b + b^T W b \\ = & \sum _{i = 1}^m b_i^2 w_i - \left(\sum _{i = 1}^m w_i b_i a_i^T \right) \left(\sum _{i = 1}^m w_i a_i a_i^T \right)^{-1} \left(\sum _{i = 1}^m w_i b_i a_i \right) \\ = & \sum _{i = 1}^m b_i^2 w_i - \sum _{i, j} w_i w_j b_i b_j a_i^T \left(\sum _{i = 1}^m w_i a_i a_i^T \right)^{-1} a_j \end{align*}\]
(\(A^T W A = \sum a_i (w_i a_i^T) = \sum w_i a_i a_i^T\))
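A final sanity check on synthetic data chosen only for illustration: the closed-form optimal value \(b^T W b - b^T W A (A^T W A)^{-1} A^T W b\) should match a direct weighted least-squares solve, and \(g\) should be concave in \(w\).

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)

def g_closed(w):
    W = np.diag(w)
    M = A.T @ W @ A                              # assumed positive definite here
    return b @ W @ b - b @ W @ A @ np.linalg.solve(M, A.T @ W @ b)

def g_numeric(w):
    # solve the weighted least-squares problem directly and evaluate its objective
    x, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * A, np.sqrt(w) * b, rcond=None)
    r = A @ x - b
    return r @ (w * r)

for _ in range(200):
    w1, w2 = rng.uniform(0.1, 2.0, size=8), rng.uniform(0.1, 2.0, size=8)
    theta = rng.uniform()
    assert np.isclose(g_closed(w1), g_numeric(w1))
    # concavity in w: g(theta w1 + (1-theta) w2) >= theta g(w1) + (1-theta) g(w2)
    mix = g_closed(theta * w1 + (1 - theta) * w2)
    assert mix >= theta * g_closed(w1) + (1 - theta) * g_closed(w2) - 1e-9
print("closed form matches weighted least squares and g is concave on the samples")
```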